Conversation

@bghira bghira commented Oct 29, 2025

this ports the CUDA NF4 support to Metal.

so far, I've targeted nf4 quant/dequant because it's one of the least-accessible formats for Mac users.

we're using uint8 under the hood. for what it's worth, Metal (and the underlying hardware) lacks fp8/fp4 support.
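The uint8 trick above can be sketched in plain Python: NF4 packs two 4-bit codes into each byte, and dequantization maps each code through a fixed 16-entry lookup table scaled by a per-block absmax. The table values below are the NF4 quantiles from the QLoRA paper; the nibble ordering (high nibble first) is an assumption for illustration and may differ from the actual bitsandbytes layout.

```python
# NF4 codebook: 16 quantiles of a normal distribution, normalized to [-1, 1].
NF4_TABLE = [
    -1.0, -0.6961928009986877, -0.5250730514526367, -0.39491748809814453,
    -0.28444138169288635, -0.18477343022823334, -0.09105003625154495, 0.0,
    0.07958029955625534, 0.16093020141124725, 0.24611230194568634,
    0.33791524171829224, 0.44070982933044434, 0.5626170039176941,
    0.7229568362236023, 1.0,
]

def pack_nf4(codes):
    """Pack pairs of 4-bit codes (0..15) into uint8 bytes, high nibble first."""
    assert len(codes) % 2 == 0
    return bytes((codes[i] << 4) | codes[i + 1] for i in range(0, len(codes), 2))

def dequantize_nf4(packed, absmax):
    """Unpack each byte into two codes and map them through the NF4 table."""
    out = []
    for byte in packed:
        out.append(NF4_TABLE[byte >> 4] * absmax)
        out.append(NF4_TABLE[byte & 0x0F] * absmax)
    return out
```

This is the per-element logic a Metal kernel would perform in parallel; since the codes are plain uint8 bytes, no fp8/fp4 hardware support is needed.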

performance has not been at the forefront of this effort; most of the time was spent working out how to plug the metallib into bitsandbytes and build it correctly.

I'd like some feedback on this approach: given my inexperience with your build toolchain, it's quite likely I've done things in ways that can be improved.

I'm building on lessons I'd learnt while building a pytorch custom op for universal-metal-flash-attention, namely the way the MTLBuffers are retrieved from torch MPSGraph objects, which required the use of the torch headers.


bghira commented Oct 29, 2025

@rickardp cc

@matthewdouglas matthewdouglas self-assigned this Oct 29, 2025
@matthewdouglas matthewdouglas self-requested a review October 29, 2025 17:44

bghira commented Nov 11, 2025

@matthewdouglas if we can get this reviewed and merged, i can continue adding Metal support.

matthewdouglas (Member) commented

Hi @bghira

For the time being, I'd like to avoid linking in libtorch. Doing so adds more complexity to our packaging process, and IMO it is not worth it for only this. We aim to support a reasonably broad range of PyTorch versions; if we linked to libtorch, we'd have to either build for each of those versions and somehow distribute them all, or pin the version of PyTorch.

Eventually we'll consider using the newer LibTorch Stable ABI, but that would be quite a while away, and we're only going to do it if there's very clear benefit to doing so.

Instead, I ask that native code be built independently of PyTorch and ideally expose a C API that can be used in the same way the CPU, CUDA, ROCm, and XPU backends work. Ideally we would not build a Python extension that links to CPython either, but if we do, I would also ask that it use one build for all Python 3.10+ versions, e.g. with the Stable ABI.
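The pattern described above can be sketched with ctypes: the native library is built with no PyTorch dependency and exposes plain C symbols, which the Python side binds at runtime. The library name and the entry point signature here are hypothetical, not the actual bitsandbytes ABI.

```python
import ctypes

def load_metal_backend(path="libbitsandbytes_mps.dylib"):
    """Bind a hypothetical NF4 dequantize symbol from a PyTorch-free dylib.

    Nothing here links against libtorch or CPython internals, so one built
    artifact works across PyTorch and Python versions.
    """
    lib = ctypes.CDLL(path)
    # Hypothetical entry point: dequantize packed NF4 codes into fp32.
    lib.cdequantize_nf4_fp32.argtypes = [
        ctypes.c_void_p,  # packed uint8 codes
        ctypes.c_void_p,  # per-block absmax values
        ctypes.c_void_p,  # fp32 output buffer
        ctypes.c_int,     # number of output elements
    ]
    lib.cdequantize_nf4_fp32.restype = None
    return lib
```

Because the binding happens at call time via ctypes rather than at link time, the same wheel can ship one native artifact per platform instead of one per (PyTorch, Python) pair.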

I'm not super familiar here, but it seems the main things we need out of Torch are:

  • torch::mps::get_dispatch_queue()
  • torch::mps::get_command_buffer()
  • torch::mps::commit()

It would be great to find a way around that. As mentioned on Discord, torch.mps.compile_shader() may also be an option. Do note that you can still write the shader in a separate file and read it in.
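The compile_shader suggestion could look roughly like this: keep the Metal kernel source in its own file (inlined below as a string for self-containment) and compile it at runtime, which avoids any link-time libtorch dependency. The kernel body mirrors the uint8/NF4 lookup described earlier; the exact dispatch API of the object returned by torch.mps.compile_shader may differ between PyTorch versions, so treat this as a sketch.

```python
# Metal source for a hypothetical NF4 dequantize kernel. In practice this
# could live in a separate .metal file and be read in with open().read().
DEQUANT_NF4_SRC = """
#include <metal_stdlib>
using namespace metal;

constant float nf4_table[16] = {
    -1.0f, -0.6961928f, -0.52507305f, -0.39491749f,
    -0.28444138f, -0.18477343f, -0.09105004f, 0.0f,
    0.0795803f, 0.1609302f, 0.2461123f, 0.33791524f,
    0.44070983f, 0.562617f, 0.72295684f, 1.0f
};

kernel void dequantize_nf4(device const uchar *packed [[buffer(0)]],
                           device const float *absmax [[buffer(1)]],
                           device float *out [[buffer(2)]],
                           uint gid [[thread_position_in_grid]]) {
    uchar b = packed[gid];
    out[2 * gid]     = nf4_table[b >> 4] * absmax[0];
    out[2 * gid + 1] = nf4_table[b & 0x0F] * absmax[0];
}
"""

def compile_kernel():
    """Compile the shader on an MPS-enabled PyTorch build (imported lazily)."""
    import torch  # requires macOS with an MPS device
    lib = torch.mps.compile_shader(DEQUANT_NF4_SRC)
    # Compiled kernels are exposed as attributes of the returned library.
    return lib.dequantize_nf4
```

This keeps the build step to plain Python packaging: no metallib compilation at install time and no C++ extension at all.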

I understand that this is a little different from most PyTorch extensions with native code, but it's intentional, to help us with distribution and broad compatibility.

@bghira bghira closed this Nov 12, 2025

bghira commented Nov 12, 2025

understood; under those constraints, Metal support basically can't be taken any further in this library without substantial performance drawbacks.

